The RecordLinkage Package: Detecting Errors in Data

Author

  • Andreas Borg
Abstract

Record linkage deals with detecting homonyms and, mainly, synonyms in data. The RecordLinkage package provides means to perform and evaluate different record linkage methods. It implements a stochastic framework that calculates weights through an EM algorithm; the thresholds this model requires can be determined with tools from extreme value theory. Machine learning methods are also available, including decision trees (rpart), bootstrap aggregating (bagging), AdaBoost (ada), neural nets (nnet) and support vector machines (svm). The package also generates record pairs and comparison patterns from single data items. Comparison patterns can be binary or based on string metrics. Blocking can be used to reduce computation time and memory usage. Future development will concentrate on additional and refined methods, performance improvements and the input/output facilities needed for real-world application.
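To make the terms "blocking" and "comparison pattern" concrete, here is a minimal Python sketch. It is not the RecordLinkage package's API (the package is written for R); the toy records, the function names and the choice of birth year as the blocking key are all assumptions made for illustration, and `difflib.SequenceMatcher` stands in for whatever string metrics the package actually provides.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (id, first name, last name, birth year).
RECORDS = [
    (1, "Anna", "Meyer", 1980),
    (2, "Anna", "Meier", 1980),
    (3, "Bernd", "Schmidt", 1975),
    (4, "Bernd", "Schmidt", 1975),
    (5, "Clara", "Schmitt", 1980),
]


def blocked_pairs(records, key):
    """Yield only those record pairs that share the same blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def binary_pattern(a, b):
    """1 where a field agrees exactly, 0 otherwise (the id is skipped)."""
    return tuple(int(x == y) for x, y in zip(a[1:], b[1:]))


def similarity_pattern(a, b):
    """String fields get a similarity in [0, 1]; other fields stay binary."""
    out = []
    for x, y in zip(a[1:], b[1:]):
        if isinstance(x, str) and isinstance(y, str):
            # Stand-in string metric, assumed here for illustration only.
            out.append(round(SequenceMatcher(None, x, y).ratio(), 2))
        else:
            out.append(float(x == y))
    return tuple(out)


# Block on birth year (index 3): 4 candidate pairs instead of all 10.
pairs = list(blocked_pairs(RECORDS, key=lambda r: r[3]))
patterns = {(a[0], b[0]): binary_pattern(a, b) for a, b in pairs}

for ids, pattern in patterns.items():
    print(ids, pattern)
```

With these toy records, blocking leaves 4 candidate pairs out of the 10 possible ones. The pair (1, 2) gets the binary pattern (1, 0, 1), while the similarity-based pattern keeps the information that "Meyer" and "Meier" are close. Patterns of this kind are the sort of input that a weight-based or machine-learning classifier would then work on.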

Similar resources

An approach to fault detection and correction in design of systems using of Turbo codes

We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes in order to obtain flexibility in the design of error-correcting codes. Code combining techniques are very effective; turbo codes are one example of such codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...

Design and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words

This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...

Detecting Bot Networks Based On HTTP And TLS Traffic Analysis

Abstract— Bot networks are a serious threat to cyber security, and their destructive behavior directly affects network performance. Detecting infected HTTP communications is a big challenge because infected HTTP connections are merged with other types of HTTP traffic. Cybercriminals prefer to use the web as a communication environment to launch application layer attacks and secretly enga...

Nurses' Perceptions of the Causes of Medication Errors: A Qualitative Study

Background & Aim: Medication errors are known as the most common preventable and life-threatening medical errors. This study aimed to explore nurses' perceptions of medication errors. Methods & Materials: This was a qualitative study with a content analysis approach. Seventeen nurses were selected purposefully from the intensive care units of Shohada hospital in Khoramabad in 2012. Data...

Explanation of Residents' Experiences Concerning Medication Errors in Neonatal Intensive Care Units: A Qualitative Study

Introduction: Medication errors are potentially hazardous to patients and can be used as a measure of patient safety in the healthcare system. Neonates are the most vulnerable population because of their body size. The experiences and views of those involved in the healthcare system can be a significant source of information gathering and planning in preventing medication errors...

Publication date: 2010